1 Introduction

Brain tumors are a significant health concern, affecting a considerable number of people worldwide.

Understanding the patterns and correlations between different variables in brain tumor datasets can help in early diagnosis, treatment planning, and improving patient outcomes.

2 Problem Statement

In the field of oncology, understanding the factors influencing brain tumor outcomes is critical for optimizing treatment strategies and patient care. Despite advancements in medical research, there remains a need to comprehensively analyze brain tumor datasets to uncover significant patterns and correlations among various variables. This analysis not only enhances our understanding of tumor behavior but also informs clinical decision-making processes.

3 Objectives

The primary objective of this project is to analyze the brain tumor dataset to identify key patterns and correlations between different variables.

Specifically, the project aims to:

- Describe the dataset

- Visualize key patterns

- Identify correlations

4 Methodology

  1. Data Preparation
  2. Exploratory Data Analysis
  3. Correlation Analysis
  4. Visualization and Reporting

5 Library Importation

5.1 Loading Necessary Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)     
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(dplyr)
library(caret)      
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
library(knitr)      
library(DT)
library(data.table)
## 
## Attaching package: 'data.table'
## 
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## 
## The following object is masked from 'package:purrr':
## 
##     transpose

6 Data Preparation

6.1 Loading the Dataset

data <- read.csv("C:\\Users\\user\\Desktop\\MouseWithoutBorders\\BrainTumor.csv")

6.2 First 5 rows

head(data,n=5)
##   Patient.ID Age Gender   Tumor.Type Tumor.Grade Tumor.Location
## 1          1  45   Male Glioblastoma          IV   Frontal lobe
## 2          2  55 Female   Meningioma           I  Parietal lobe
## 3          3  60   Male  Astrocytoma         III Occipital lobe
## 4          4  50 Female Glioblastoma          IV  Temporal lobe
## 5          5  65   Male  Astrocytoma          II   Frontal lobe
##                     Treatment   Treatment.Outcome Time.to.Recurrence..months.
## 1                     Surgery    Partial response                          10
## 2                     Surgery   Complete response                          NA
## 3      Surgery + Chemotherapy Progressive disease                          14
## 4 Surgery + Radiation therapy   Complete response                          NA
## 5 Surgery + Radiation therapy    Partial response                          24
##   Recurrence.Site Survival.Time..months.
## 1   Temporal lobe                     18
## 2            <NA>                     36
## 3    Frontal lobe                     22
## 4            <NA>                     12
## 5    Frontal lobe                     48

6.3 Missing Values

## Number of missing values in the dataset 
sum(is.na(data))
## [1] 1124
## checking the columns with missing values 
colSums(is.na(data))
##                  Patient.ID                         Age 
##                           0                           0 
##                      Gender                  Tumor.Type 
##                           0                           0 
##                 Tumor.Grade              Tumor.Location 
##                           0                           0 
##                   Treatment           Treatment.Outcome 
##                           0                           0 
## Time.to.Recurrence..months.             Recurrence.Site 
##                         562                         562 
##      Survival.Time..months. 
##                           0

6.4 Handling Missing Values

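# Snapshots of the numeric and character columns, taken before imputation,
# so numeric_columns still contains the original NA values
numeric_columns <- data %>% 
  select(where(is.numeric))
character_columns <- data %>% 
  select(where(is.character))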

## Handling missing values in numeric variables
data <- data %>% 
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), -1, .)))


## Handling missing values in character variables 
data <- data %>% 
  mutate(across(where(is.character), ~ ifelse(is.na(.), "unknown", .)))
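
One caveat with replacing missing numeric values by -1 is that the sentinel is still a number, so it enters any later mean or correlation unless it is filtered out. The following is a minimal sketch on a hypothetical four-patient vector (the values and the names `toy`, `imputed` are illustrative assumptions, not taken from the dataset):

```r
# Hypothetical toy data: months to recurrence, NA where no recurrence was recorded
toy <- data.frame(
  patient = 1:4,
  recurrence_months = c(10, NA, 14, NA)
)

# Sentinel imputation in the style used above: NA -> -1
imputed <- ifelse(is.na(toy$recurrence_months), -1, toy$recurrence_months)

# The sentinel is treated as a real measurement and pulls the mean down
mean_with_sentinel <- mean(imputed)

# Restricting to observed (non-negative) values gives the mean of real measurements
mean_observed <- mean(imputed[imputed >= 0])

print(c(with_sentinel = mean_with_sentinel, observed_only = mean_observed))
```

Keeping `NA` and passing `na.rm = TRUE` to summary functions is an alternative that avoids the sentinel altogether.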

6.5 Checking if Missing Values still Exist

colSums(is.na(data))
##                  Patient.ID                         Age 
##                           0                           0 
##                      Gender                  Tumor.Type 
##                           0                           0 
##                 Tumor.Grade              Tumor.Location 
##                           0                           0 
##                   Treatment           Treatment.Outcome 
##                           0                           0 
## Time.to.Recurrence..months.             Recurrence.Site 
##                           0                           0 
##      Survival.Time..months. 
##                           0

7 Exploratory Data Analysis

7.1 Statistical Summary

datatable(summary(data))

7.2 Variable Distributions

7.2.1 Age Distribution

age_distro <- plot_ly(data, x=~Age, type="histogram", width=500, height=400)

age_distro <- age_distro %>% 
  layout(
    title = "Age Distribution",
    xaxis = list(
      title = "Age"
    ),
    yaxis = list(
      title = "Frequency"
    )
  )
age_distro

The histogram of age distribution illustrates the frequency of various ages within the dataset.

7.2.1.1 Key Insights

  • Age Range: The data spans ages from approximately 30 to 70.

  • Peak Frequency: The highest frequency of individuals is around the age of 55, indicating that this age group is the most represented in the dataset.

  • Distribution Shape: The distribution shows a relatively normal (bell-curve) shape with a central peak and tapering frequencies at the extremes. This suggests a balanced distribution around middle age.

This age distribution insight can be useful in understanding the demographic characteristics of the dataset, aiding in more targeted analyses or interventions based on age-related factors.

7.2.2 Gender Distribution

gender_count <- table(data$Gender)
gender_count <- as.data.frame(gender_count)
colnames(gender_count) <- c("Gender", "Count")


gender_distro <- plot_ly(gender_count, x = ~Gender, y = ~Count, type = 'bar',width=500, height = 200) %>%
  layout(
    title = list(
      text = "Gender Distribution",
      font = list(size=15)),
    xaxis = list(title = list(
                   text = "Gender",
                   font =list(size=10))),
    yaxis = list(title =list(
                          text = "Count",
                          font = list(size=10))))
gender_count
##   Gender Count
## 1 Female  1007
## 2   Male   993
gender_distro

7.2.2.1 Key Insights

Based on the gender distribution analysis of the dataset, I observe the following counts:

  • Female: 1007

  • Male: 993

The gender distribution is almost equal, with slightly more females than males: females make up approximately 50.35% of the dataset (1007 of 2000) and males 49.65% (993 of 2000). This near parity means that gender comparisons in this dataset are not skewed by unequal group sizes.
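
The percentage shares can be derived directly from the counts rather than by hand. A minimal sketch, using the two counts printed above and base R's `prop.table()`:

```r
# Counts taken from the gender table in this report
gender_count <- c(Female = 1007, Male = 993)

# prop.table() divides each count by the total; multiply by 100 for percent
gender_pct <- round(100 * prop.table(gender_count), 2)

print(gender_pct)
```

The same call works on the output of `table(data$Gender)`, so the shares stay in sync with the data if the dataset changes.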

7.2.3 Tumor Type Distribution

type_count <- table(data$Tumor.Type)
type_df <- as.data.frame(type_count)
colnames(type_df) <- c("Type","Count")

type_fig <- plot_ly(type_df, x=~Type,y=~Count, type = "bar",width = 500, height = 300) %>% 
  layout(
    title = list(
      text = "Tumor Type Distribution",
      font = list(size=15)
      ),
    xaxis = list(
      title = "Tumor Type",
      font = list(size=9),
      tickfont = list(size = 8)
    ),
    yaxis = list(
      title = "Count",
      font = list(size = 9)
    )
    )
type_df
##           Type Count
## 1  Astrocytoma   653
## 2 Glioblastoma   637
## 3   Meningioma   710
type_fig

The counts for each tumor type in the dataset are:

  • Astrocytoma: 653 cases

  • Glioblastoma: 637 cases

  • Meningioma: 710 cases

7.2.3.1 Key Insights

The dataset comprises three tumor types with the following distribution (2000 cases in total):

  • Meningioma is the most common, with 710 cases, accounting for 35.50% of the total.

  • Astrocytoma follows, with 653 cases, making up 32.65%.

  • Glioblastoma has the fewest cases, 637, representing 31.85%.

This distribution indicates a fairly balanced representation across these three conditions, with Meningioma being slightly more prevalent. Such insights can help prioritize research or resource allocation based on the frequency of these conditions.

7.2.4 Gender-Based Distribution of Tumor Types

type_gender_count <- table(data$Tumor.Type, data$Gender)
type_gender_df <- as.data.frame(type_gender_count)
colnames(type_gender_df) <- c("Type", "Gender", "Count")

# Create Plotly grouped bar chart
type_gender_fig <- plot_ly(type_gender_df, x = ~Type, y = ~Count, color = ~Gender, type = "bar", width = 500, height = 300) %>% 
  layout(
    title = list(
      text = "Tumor Type Distribution by Gender",
      font = list(size = 15)
    ),
    xaxis = list(
      title = "Tumor Type",
      font = list(size = 9),
      tickfont = list(size = 8)
    ),
    yaxis = list(
      title = "Count",
      font = list(size = 9)
    )
  )
type_gender_df
##           Type Gender Count
## 1  Astrocytoma Female   332
## 2 Glioblastoma Female   297
## 3   Meningioma Female   378
## 4  Astrocytoma   Male   321
## 5 Glioblastoma   Male   340
## 6   Meningioma   Male   332
type_gender_fig
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

7.2.4.1 Key Insights:

  1. Astrocytoma:

    • Females: 332 cases (50.8% of total Astrocytoma cases)

    • Males: 321 cases (49.2% of total Astrocytoma cases)

    • Insight: The distribution of Astrocytoma is nearly balanced between genders, with a slight predominance in females.

  2. Glioblastoma:

    • Females: 297 cases (46.6% of total Glioblastoma cases)

    • Males: 340 cases (53.4% of total Glioblastoma cases)

    • Insight: Glioblastoma shows a slight predominance in males.

  3. Meningioma:

    • Females: 378 cases (53.2% of total Meningioma cases)

    • Males: 332 cases (46.8% of total Meningioma cases)

    • Insight: Meningioma is more common in females compared to males.

7.2.4.2 Overall Gender Distribution:

  • Females: 1007 cases in total

  • Males: 993 cases in total

7.2.4.3 Summary:

The gender distribution across the different types of conditions in the dataset shows:

  • Astrocytoma is fairly balanced between genders.

  • Glioblastoma is slightly more common in males.

  • Meningioma is more common in females.

These insights can help in understanding the gender-specific prevalence of these conditions, which may be crucial for targeted research, treatment planning, and resource allocation.
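
The within-type gender percentages quoted above are row percentages of the type-by-gender table. A minimal sketch that reproduces them from the printed counts, using `prop.table()` with `margin = 1`:

```r
# Counts taken from the type-by-gender table in this report
type_gender <- matrix(
  c(332, 321,
    297, 340,
    378, 332),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Type = c("Astrocytoma", "Glioblastoma", "Meningioma"),
    Gender = c("Female", "Male")
  )
)

# margin = 1 normalises each row, giving the gender split within each tumor type
row_pct <- round(100 * prop.table(type_gender, margin = 1), 1)

print(row_pct)
```

On the real data, `prop.table(table(data$Tumor.Type, data$Gender), margin = 1)` should give the same split without retyping the counts.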

7.2.5 Tumor Location Distribution

location_count <- table(data$Tumor.Location)
location_df <- as.data.frame(location_count)
colnames(location_df) <- c("Location","Count")

location_fig <- plot_ly(location_df, x=~Location,y=~Count, type = "bar",width = 500, height = 300) %>% 
  layout(
    title = list(
      text = "Tumor Location Distribution",
      font = list(size=15)
      ),
    xaxis = list(
      title = "Tumor Location",
      font = list(size=9),
      tickfont = list(size = 8)
    ),
    yaxis = list(
      title = "Count",
      font = list(size = 9)
    )
    )
location_df
##         Location Count
## 1   Frontal lobe   515
## 2 Occipital lobe   485
## 3  Parietal lobe   503
## 4  Temporal lobe   497
location_fig

7.2.5.1 Key Insights:

  1. Frontal Lobe:

    • Count: 515 cases

    • Insight: The frontal lobe is the most common location for tumors in this dataset, accounting for 25.75% of all cases.

  2. Occipital Lobe:

    • Count: 485 cases

    • Insight: The occipital lobe has the fewest cases among the four lobes, representing 24.25% of all cases.

  3. Parietal Lobe:

    • Count: 503 cases

    • Insight: Tumors in the parietal lobe make up 25.15% of the dataset.

  4. Temporal Lobe:

    • Count: 497 cases

    • Insight: The temporal lobe accounts for 24.85% of all cases.

7.2.5.2 Summary:

The distribution of tumors across the four brain lobes in the dataset is nearly even. The frontal lobe is the most common site and the occipital lobe the least common, but the difference between them is only 30 cases out of 2000.

These insights are crucial for understanding the anatomical distribution of tumors, which can inform medical research, diagnosis strategies, and treatment planning.
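
Whether the four location counts are compatible with an even spread can be checked with a chi-squared goodness-of-fit test. This is a sketch of one possible check, not part of the original analysis; it uses the counts printed above and base R's `chisq.test()`, which tests against equal proportions when given a single vector:

```r
# Counts taken from the tumor location table in this report
location_count <- c(
  "Frontal lobe" = 515, "Occipital lobe" = 485,
  "Parietal lobe" = 503, "Temporal lobe" = 497
)

# With a single vector and no p argument, chisq.test() compares the counts
# against equal expected frequencies (here 500 cases per lobe)
fit <- chisq.test(location_count)

print(fit)
```

A large p-value here would be consistent with the description of the distribution as balanced; it would not prove that the lobes are equally likely.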

7.2.6 Gender-Based Distribution of Tumor Location

location_gender_count <- table(data$Tumor.Location, data$Gender)
location_gender_df <- as.data.frame(location_gender_count)
colnames(location_gender_df) <- c("Location", "Gender", "Count")

# Create Plotly grouped bar chart
location_gender_fig <- plot_ly(location_gender_df, x = ~Location, y = ~Count, color = ~Gender, type = "bar", width = 500, height = 300) %>% 
  layout(
    title = list(
      text = "Tumor Location Distribution by Gender",
      font = list(size = 15)
    ),
    xaxis = list(
      title = "Tumor Location",
      font = list(size = 9),
      tickfont = list(size = 8)
    ),
    yaxis = list(
      title = "Count",
      font = list(size = 9)
    )
  )
location_gender_df
##         Location Gender Count
## 1   Frontal lobe Female   198
## 2 Occipital lobe Female   368
## 3  Parietal lobe Female   321
## 4  Temporal lobe Female   120
## 5   Frontal lobe   Male   317
## 6 Occipital lobe   Male   117
## 7  Parietal lobe   Male   182
## 8  Temporal lobe   Male   377
location_gender_fig
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

7.2.6.1 Key Insights

  1. Frontal Lobe:

    • Females: 198 cases (38.4% of total Frontal lobe cases)

    • Males: 317 cases (61.6% of total Frontal lobe cases)

    • Insight: The frontal lobe tumors are more prevalent in males compared to females.

  2. Occipital Lobe:

    • Females: 368 cases (75.9% of total Occipital lobe cases)

    • Males: 117 cases (24.1% of total Occipital lobe cases)

    • Insight: The occipital lobe tumors are significantly more common in females.

  3. Parietal Lobe:

    • Females: 321 cases (63.8% of total Parietal lobe cases)

    • Males: 182 cases (36.2% of total Parietal lobe cases)

    • Insight: The parietal lobe tumors are more common in females.

  4. Temporal Lobe:

    • Females: 120 cases (24.1% of total Temporal lobe cases)

    • Males: 377 cases (75.9% of total Temporal lobe cases)

    • Insight: The temporal lobe tumors are more prevalent in males.

7.2.6.2 Summary

The distribution of tumors across different brain lobes and genders shows distinct patterns:

  • Frontal Lobe: More common in males.

  • Occipital Lobe: Significantly more common in females.

  • Parietal Lobe: More common in females.

  • Temporal Lobe: More common in males.

These gender-specific insights can help in understanding the anatomical and demographic characteristics of tumors, which is crucial for personalized medical research, targeted treatment strategies, and resource allocation.
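
The gender differences by location can be tested formally with a chi-squared test of independence. The sketch below is an illustrative addition, assuming the counts printed in the location-by-gender table; the standardised residuals indicate which cells depart most from what independence would predict:

```r
# Counts taken from the location-by-gender table in this report
location_gender <- matrix(
  c(198, 317,
    368, 117,
    321, 182,
    120, 377),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Location = c("Frontal lobe", "Occipital lobe", "Parietal lobe", "Temporal lobe"),
    Gender = c("Female", "Male")
  )
)

# Test of independence between tumor location and gender
test <- chisq.test(location_gender)
print(test)

# Positive residuals mark cells with more cases than expected under independence
print(round(test$stdres, 1))
```

The signs of the residuals should match the pattern described in the summary: positive for females in the occipital and parietal lobes, positive for males in the frontal and temporal lobes.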

7.2.7 Tumor Treatment Distribution by Type

treatment_type_count <- table(data$Tumor.Type, data$Treatment)
treatment_type_df <- as.data.frame(treatment_type_count)
colnames(treatment_type_df) <- c("Type", "Treatment", "Count")

# Create Plotly grouped bar chart
treatment_type_fig <- plot_ly(treatment_type_df, x = ~Treatment, y = ~Count, color = ~Type, type = "bar", width = 700, height = 300) %>% 
  layout(
    title = list(
      text = "Tumor Treatment Distribution by Type",
      font = list(size = 15)
    ),
    xaxis = list(
      title = "Tumor Treatment",
      font = list(size = 9),
      tickfont = list(size = 10)
    ),
    yaxis = list(
      title = "Count",
      font = list(size = 9)
    )
  )
treatment_type_df
##            Type                   Treatment Count
## 1   Astrocytoma                Chemotherapy    52
## 2  Glioblastoma                Chemotherapy    21
## 3    Meningioma                Chemotherapy    50
## 4   Astrocytoma    Chemotherapy + Radiation     0
## 5  Glioblastoma    Chemotherapy + Radiation     2
## 6    Meningioma    Chemotherapy + Radiation     0
## 7   Astrocytoma                   Radiation    21
## 8  Glioblastoma                   Radiation     7
## 9    Meningioma                   Radiation    45
## 10  Astrocytoma                     Surgery    45
## 11 Glioblastoma                     Surgery    15
## 12   Meningioma                     Surgery    79
## 13  Astrocytoma      Surgery + Chemotherapy   223
## 14 Glioblastoma      Surgery + Chemotherapy   341
## 15   Meningioma      Surgery + Chemotherapy   215
## 16  Astrocytoma         Surgery + Radiation   311
## 17 Glioblastoma         Surgery + Radiation   250
## 18   Meningioma         Surgery + Radiation   321
## 19  Astrocytoma Surgery + Radiation therapy     1
## 20 Glioblastoma Surgery + Radiation therapy     1
## 21   Meningioma Surgery + Radiation therapy     0
treatment_type_fig

7.2.7.1 Key Insights

  1. Chemotherapy:

    • Astrocytoma: 52 cases

    • Glioblastoma: 21 cases

    • Meningioma: 50 cases

    • Insight: Chemotherapy alone is used about equally often for Astrocytoma and Meningioma, and less than half as often for Glioblastoma.

  2. Chemotherapy + Radiation:

    • Astrocytoma: 0 cases

    • Glioblastoma: 2 cases

    • Meningioma: 0 cases

    • Insight: The combination of chemotherapy and radiation is rarely used, only appearing in a couple of Glioblastoma cases.

  3. Radiation:

    • Astrocytoma: 21 cases

    • Glioblastoma: 7 cases

    • Meningioma: 45 cases

    • Insight: Radiation is most commonly used for Meningioma, followed by Astrocytoma and then Glioblastoma.

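  4. Surgery:

    • Astrocytoma: 45 cases

    • Glioblastoma: 15 cases

    • Meningioma: 79 cases

    • Insight: Surgery alone is most common for Meningioma and least common for Glioblastoma.

  5. Surgery + Chemotherapy:

    • Astrocytoma: 223 cases

    • Glioblastoma: 341 cases

    • Meningioma: 215 cases

    • Insight: This is the most frequent treatment for Glioblastoma and the second most frequent overall (779 cases).

  6. Surgery + Radiation:

    • Astrocytoma: 311 cases

    • Glioblastoma: 250 cases

    • Meningioma: 321 cases

    • Insight: This is the most frequent treatment overall (882 cases) and the most frequent for both Astrocytoma and Meningioma.

  7. Surgery + Radiation therapy:

    • Astrocytoma: 1 case

    • Glioblastoma: 1 case

    • Meningioma: 0 cases

    • Insight: This label appears in only 2 records and most likely duplicates Surgery + Radiation under a different spelling.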

7.2.7.2 Summary

The distribution of treatments across tumor types shows:

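  • Surgery + Radiation (882 cases) and Surgery + Chemotherapy (779 cases) together account for about 83% of all cases.

  • Surgery + Chemotherapy is the most frequent treatment for Glioblastoma; Surgery + Radiation is the most frequent for Astrocytoma and Meningioma.

  • Single-modality treatments are much less common: Surgery alone (139 cases), Chemotherapy alone (123 cases), and Radiation alone (73 cases). Each is used least often for Glioblastoma.

  • Chemotherapy + Radiation appears in only 2 cases, both Glioblastoma.

  • The label Surgery + Radiation therapy (2 cases) appears to be a variant spelling of Surgery + Radiation.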

These insights highlight the preferred treatment modalities for different tumor types, which can be essential for clinical decision-making, treatment planning, and understanding the therapeutic landscape.
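
The treatment table lists both "Surgery + Radiation" and "Surgery + Radiation therapy". If these denote the same treatment, which is an assumption that should be confirmed against the data source, the labels can be merged before counting. A minimal sketch on a hypothetical vector of labels (`treatment` and `treatment_clean` are illustrative names):

```r
# Hypothetical sample of treatment labels, including the variant spelling
treatment <- c(
  "Surgery",
  "Surgery + Radiation",
  "Surgery + Radiation therapy",
  "Chemotherapy",
  "Surgery + Radiation therapy"
)

# Assumption: "Surgery + Radiation therapy" is the same treatment as "Surgery + Radiation"
treatment_clean <- ifelse(
  treatment == "Surgery + Radiation therapy",
  "Surgery + Radiation",
  treatment
)

print(table(treatment_clean))
```

Applied to `data$Treatment` before tabulating, this would fold the two stray records into the main category.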

7.2.8 Tumor Treatment Distribution by Outcome

treatment_outcome_count <- table(data$Treatment.Outcome, data$Treatment)
treatment_outcome_df <- as.data.frame(treatment_outcome_count)
colnames(treatment_outcome_df) <- c("Outcome", "Treatment", "Count")

# Create Plotly grouped bar chart
treatment_outcome_fig <- plot_ly(treatment_outcome_df, x = ~Treatment, y = ~Count, color = ~Outcome, type = "bar", width = 700, height = 300) %>% 
  layout(
    title = list(
      text = "Tumor Treatment Distribution by Outcome",
      font = list(size = 15)
    ),
    xaxis = list(
      title = "Tumor Treatment",
      font = list(size = 9),
      tickfont = list(size = 10)
    ),
    yaxis = list(
      title = "Count",
      font = list(size = 9)
    )
  )
treatment_outcome_df
##                Outcome                   Treatment Count
## 1    Complete response                Chemotherapy    31
## 2     Partial response                Chemotherapy     8
## 3  Progressive disease                Chemotherapy    30
## 4       Stable disease                Chemotherapy    54
## 5    Complete response    Chemotherapy + Radiation     0
## 6     Partial response    Chemotherapy + Radiation     0
## 7  Progressive disease    Chemotherapy + Radiation     2
## 8       Stable disease    Chemotherapy + Radiation     0
## 9    Complete response                   Radiation     9
## 10    Partial response                   Radiation     4
## 11 Progressive disease                   Radiation    32
## 12      Stable disease                   Radiation    28
## 13   Complete response                     Surgery    78
## 14    Partial response                     Surgery    14
## 15 Progressive disease                     Surgery    39
## 16      Stable disease                     Surgery     8
## 17   Complete response      Surgery + Chemotherapy   201
## 18    Partial response      Surgery + Chemotherapy   167
## 19 Progressive disease      Surgery + Chemotherapy   123
## 20      Stable disease      Surgery + Chemotherapy   288
## 21   Complete response         Surgery + Radiation   241
## 22    Partial response         Surgery + Radiation   159
## 23 Progressive disease         Surgery + Radiation   356
## 24      Stable disease         Surgery + Radiation   126
## 25   Complete response Surgery + Radiation therapy     1
## 26    Partial response Surgery + Radiation therapy     1
## 27 Progressive disease Surgery + Radiation therapy     0
## 28      Stable disease Surgery + Radiation therapy     0
treatment_outcome_fig

7.2.8.1 Key Insights

  1. Chemotherapy:

    • Complete response: 31 cases

    • Partial response: 8 cases

    • Progressive disease: 30 cases

    • Stable disease: 54 cases

    • Insight: Stable disease is the most frequent outcome after chemotherapy alone (54 of 123 cases), while complete response (31) and progressive disease (30) are almost equally common.

  2. Chemotherapy + Radiation:

    • Complete response: 0 cases

    • Partial response: 0 cases

    • Progressive disease: 2 cases

    • Stable disease: 0 cases

    • Insight: The combination of chemotherapy and radiation shows very few cases, with only 2 instances of progressive disease.

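  3. Radiation:

    • Complete response: 9 cases

    • Partial response: 4 cases

    • Progressive disease: 32 cases

    • Stable disease: 28 cases

    • Insight: After radiation alone, progressive disease (32) and stable disease (28) dominate; complete and partial responses together account for only 13 of 73 cases.

  4. Surgery:

    • Complete response: 78 cases

    • Partial response: 14 cases

    • Progressive disease: 39 cases

    • Stable disease: 8 cases

    • Insight: Complete response is the most frequent outcome after surgery alone (78 of 139 cases).

  5. Surgery + Chemotherapy:

    • Complete response: 201 cases

    • Partial response: 167 cases

    • Progressive disease: 123 cases

    • Stable disease: 288 cases

    • Insight: Stable disease is the most frequent outcome (288 of 779 cases), followed by complete response (201).

  6. Surgery + Radiation:

    • Complete response: 241 cases

    • Partial response: 159 cases

    • Progressive disease: 356 cases

    • Stable disease: 126 cases

    • Insight: Progressive disease is the most frequent outcome (356 of 882 cases), followed by complete response (241).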

7.2.8.2 Summary

The distribution of treatment outcomes shows:

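  • Surgery + Radiation and Surgery + Chemotherapy account for most cases; their most frequent outcomes are progressive disease (356 of 882) and stable disease (288 of 779), respectively.

  • Surgery alone most often ends in complete response (78 of 139 cases).

  • Chemotherapy alone most often ends in stable disease (54 of 123 cases), and radiation alone in progressive disease (32 of 73 cases).

  • Chemotherapy + Radiation appears in only 2 cases, both with progressive disease, which is too few to draw any conclusion.

  • Because the treatment groups differ greatly in size, outcomes are best compared as proportions within each treatment rather than as raw counts.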

These insights can help understand the effectiveness of different treatments, guide therapeutic decisions, and optimize treatment plans for better patient outcomes.
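
Raw counts are hard to compare across treatments with very different group sizes. The sketch below converts the counts from the outcome table into within-treatment percentages; the two labels with only 2 cases each are left out, which is a simplification made here for readability:

```r
# Counts taken from the outcome-by-treatment table in this report
outcome_by_treatment <- matrix(
  c( 31,   8,  30,  54,   # Chemotherapy
      9,   4,  32,  28,   # Radiation
     78,  14,  39,   8,   # Surgery
    201, 167, 123, 288,   # Surgery + Chemotherapy
    241, 159, 356, 126),  # Surgery + Radiation
  ncol = 4, byrow = TRUE,
  dimnames = list(
    Treatment = c("Chemotherapy", "Radiation", "Surgery",
                  "Surgery + Chemotherapy", "Surgery + Radiation"),
    Outcome = c("Complete response", "Partial response",
                "Progressive disease", "Stable disease")
  )
)

# margin = 1 normalises each row: share of each outcome within a treatment
outcome_pct <- round(100 * prop.table(outcome_by_treatment, margin = 1), 1)

print(outcome_pct)
```

These are descriptive shares only. Treatment assignment in such data usually depends on tumor type and grade, so differences between rows do not by themselves show that one treatment works better than another.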

7.2.9 Average Survival Months by Treatment

average_survival_treatment <- data %>% 
  group_by(Treatment) %>% 
  summarize(average_survival = round(mean(data$Survival.Time..months.,na.rm=TRUE)))
average_survival_treatment
## # A tibble: 7 × 2
##   Treatment                   average_survival
##   <chr>                                  <dbl>
## 1 Chemotherapy                              34
## 2 Chemotherapy + Radiation                  34
## 3 Radiation                                 34
## 4 Surgery                                   34
## 5 Surgery + Chemotherapy                    34
## 6 Surgery + Radiation                       34
## 7 Surgery + Radiation therapy               34
survival_average_df <- as.data.frame(average_survival_treatment)
colnames(survival_average_df) <- c("Treatment", "Average Survival Month")

survival_average_fig <- plot_ly(survival_average_df, x = ~Treatment, y = ~`Average Survival Month`,type = "bar", width = 700, height = 300) %>% 
  layout(
    title = list(
      text = "Average Survival Months by Treatment",
      font = list(size = 15)
    ),
    xaxis = list(
      title = "Tumor Treatment",
      font = list(size = 9),
      tickfont = list(size = 10)
    ),
    yaxis = list(
      title = "Average Survival Months",
      font = list(size = 9)
    )
  )
survival_average_df
##                     Treatment Average Survival Month
## 1                Chemotherapy                     34
## 2    Chemotherapy + Radiation                     34
## 3                   Radiation                     34
## 4                     Surgery                     34
## 5      Surgery + Chemotherapy                     34
## 6         Surgery + Radiation                     34
## 7 Surgery + Radiation therapy                     34
survival_average_fig

7.2.9.1 Key Insights

Based on the average survival months reported for each treatment type:

  1. Chemotherapy, Chemotherapy + Radiation, Radiation, Surgery, Surgery + Chemotherapy, Surgery + Radiation, Surgery + Radiation therapy: All show an average survival of 34 months.

7.2.9.2 Summary

The identical values are an artifact of the code rather than a finding. Inside summarize(), the expression data$Survival.Time..months. refers to the full column of the ungrouped data frame, so mean() returns the overall dataset mean for every treatment group. The value 34 is therefore the overall average survival time, rounded to the nearest month, and the table does not show whether survival differs between treatments. Referring to the column by its bare name, mean(Survival.Time..months., na.rm = TRUE), would produce one average per treatment.
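
The difference between a grouped and an ungrouped mean inside `summarize()` can be seen on a small example. The following is a minimal sketch on hypothetical toy data (the data frame `toy` and its values are assumptions for illustration, not taken from the brain tumor dataset):

```r
library(dplyr)

# Hypothetical toy data, not taken from the brain tumor dataset
toy <- data.frame(
  Treatment = c("Surgery", "Surgery", "Radiation", "Radiation"),
  Survival = c(40, 50, 10, 20)
)

# toy$Survival is the whole column, so every group receives the overall mean
overall <- toy %>%
  group_by(Treatment) %>%
  summarize(average_survival = mean(toy$Survival))

# The bare column name is evaluated separately within each group
per_group <- toy %>%
  group_by(Treatment) %>%
  summarize(average_survival = mean(Survival))

print(overall)
print(per_group)
```

In the first result both rows carry the same value; in the second each treatment has its own mean, which is the behaviour a per-treatment comparison needs.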

7.2.10 Correlation Analysis

correlation_matrix <- cor(numeric_columns)
correlation_matrix
##                               Patient.ID          Age
## Patient.ID                   1.000000000 -0.009721277
## Age                         -0.009721277  1.000000000
## Time.to.Recurrence..months.           NA           NA
## Survival.Time..months.      -0.016479283  0.066281516
##                             Time.to.Recurrence..months. Survival.Time..months.
## Patient.ID                                           NA            -0.01647928
## Age                                                  NA             0.06628152
## Time.to.Recurrence..months.                           1                     NA
## Survival.Time..months.                               NA             1.00000000

7.2.10.1 Key Insight

Based on the correlation matrix :

  1. Patient.ID and Age: There is a very weak negative correlation (approximately -0.0097) between Patient ID and Age. This suggests that there is no meaningful relationship between patient identification numbers and age in this dataset.

  2. Age and Survival Time (months): There is a very weak positive correlation (approximately 0.0663) between Age and Survival Time. This implies that older age may slightly correlate with longer survival time, although the correlation is quite weak.

  3. Time to Recurrence (months): Every correlation involving Time to Recurrence is NA. This column has 562 missing values, and numeric_columns was created before the missing values were replaced. With its default setting use = "everything", cor() returns NA for any pair of columns that includes a missing value. The NA therefore reflects how missing data were handled, not a lack of variability.

  4. Survival Time (months): The correlation with Patient ID is very weak and negative (approximately -0.0165), as expected for an arbitrary identifier. The correlation with Age is weakly positive (approximately 0.0663). The correlation with Time to Recurrence is NA for the reason given in point 3.

7.2.10.2 Interpretation

  • Both available correlations with survival time are close to zero, so neither Age nor Patient ID shows a meaningful linear association with survival in this dataset. The relationship between survival time and time to recurrence remains undetermined until the correlation is computed on complete observations only.

  • Further analysis or additional variables may be needed to better understand the factors influencing survival times or recurrence rates in the context of brain tumor patients.
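
When a column contains missing values, `cor()` can be told to use only the rows where both variables are observed. A minimal sketch, using the five rows shown by head(data) earlier as toy input (the column names are shortened here for readability):

```r
# Values from the first five rows of the dataset shown earlier
toy <- data.frame(
  age = c(45, 55, 60, 50, 65),
  recurrence_months = c(10, NA, 14, NA, 24),
  survival_months = c(18, 36, 22, 12, 48)
)

# Default (use = "everything"): any pair involving a column with NA yields NA
default_cor <- cor(toy)

# Pairwise deletion: each pair uses the rows where both values are present
pairwise_cor <- cor(toy, use = "pairwise.complete.obs")

print(default_cor)
print(pairwise_cor)
```

With pairwise deletion each coefficient may be based on a different number of rows, so it is worth reporting the sample size alongside the correlation. Five rows are far too few to interpret; the example only shows the mechanics.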